fix(bedrock): enable websearch_interception with extended thinking on Bedrock#20489
Quentin-M wants to merge 9 commits into BerriAI:main
Conversation
**Greptile Overview**
**Greptile Summary**
This PR adds two enhancements to the websearch interception handler: extracting API keys from router configuration and handling thinking parameter constraints for Anthropic models.
**Key Changes:**
Critical Issue Found:
Confidence Score: 1/5
| Filename | Overview |
|---|---|
| litellm/integrations/websearch_interception/handler.py | Added API key handling from router config and thinking parameter logic, but critical bug: wrong variable passed to helper functions (module instead of messages list) |
Sequence Diagram
```mermaid
sequenceDiagram
    participant Client
    participant WebSearchHandler
    participant Router
    participant LLM
    participant SearchProvider
    Client->>WebSearchHandler: Request with websearch tool
    WebSearchHandler->>WebSearchHandler: Pre-request hook: convert native tools
    WebSearchHandler->>LLM: Initial request
    LLM-->>WebSearchHandler: Response with tool_use blocks
    WebSearchHandler->>WebSearchHandler: Detect websearch tool_use
    Note over WebSearchHandler,Router: Extract API keys from router config
    WebSearchHandler->>Router: Get search_tools config
    Router-->>WebSearchHandler: search_provider, api_key, api_base
    loop For each search query
        WebSearchHandler->>SearchProvider: Execute search (with api_key)
        SearchProvider-->>WebSearchHandler: Search results
    end
    Note over WebSearchHandler: Check thinking parameter
    alt thinking.budget_tokens > max_tokens
        WebSearchHandler->>WebSearchHandler: Adjust max_tokens = budget_tokens + 4096
    end
    alt Last tool_call message has no thinking_blocks
        WebSearchHandler->>WebSearchHandler: Drop thinking parameter
    end
    WebSearchHandler->>LLM: Follow-up request with search results
    LLM-->>WebSearchHandler: Final response
    WebSearchHandler-->>Client: Return final response
```
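The two `alt` branches in the diagram can be sketched as a single follow-up-request guard. This is a minimal sketch with hypothetical names (`adjust_for_thinking`, `DEFAULT_MAX_TOKENS`), not the actual code in `litellm/integrations/websearch_interception/handler.py`:

```python
from typing import Any, Dict, List

DEFAULT_MAX_TOKENS = 4096  # headroom added on top of the thinking budget


def adjust_for_thinking(
    kwargs: Dict[str, Any], messages: List[Dict[str, Any]]
) -> Dict[str, Any]:
    """Apply the two guards from the diagram before the follow-up request.

    1. Anthropic requires max_tokens > thinking.budget_tokens, so bump
       max_tokens when the budget would meet or exceed it.
    2. If the last assistant message carries no thinking blocks, re-sending
       the `thinking` param would be rejected, so drop it.
    """
    kwargs = dict(kwargs)  # do not mutate the caller's kwargs
    thinking = kwargs.get("thinking") or {}
    budget = thinking.get("budget_tokens")
    if budget is not None and kwargs.get("max_tokens", 0) <= budget:
        kwargs["max_tokens"] = budget + DEFAULT_MAX_TOKENS

    last_assistant = next(
        (m for m in reversed(messages) if m.get("role") == "assistant"), None
    )
    if last_assistant is not None and not last_assistant.get("thinking_blocks"):
        kwargs.pop("thinking", None)
    return kwargs
```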
*Force-pushed from ef0f21a to ff825ed*
Hey @Quentin-M, nice PR — both fixes address real production issues and the code is clean. A couple of things to address before merge:
1. Missing
*Force-pushed from ff825ed to 9e243eb*
*Force-pushed from 9e243eb to a8b16f5*
*Force-pushed from 440e33a to e9c975c*
*Force-pushed from e9c975c to d67ca1d*
*Force-pushed from d67ca1d to c93a203*
@greptile
**Greptile Overview**
**Greptile Summary**
Confidence Score: 2/5
| Filename | Overview |
|---|---|
| litellm/constants.py | Updates BEDROCK_CONVERSE_MODELS; Opus 4.6 entry drops ':0' suffix which breaks matching against other Bedrock IDs. |
| litellm/integrations/websearch_interception/handler.py | Fixes websearch interception loop to preserve kwargs, thinking blocks, and to load search tool credentials from router config. |
| litellm/integrations/websearch_interception/transformation.py | Returns structured TransformRequestResult including tool_calls and thinking blocks; prepends thinking blocks to follow-up assistant message. |
| litellm/litellm_core_utils/core_helpers.py | Extends internal param filtering to handle prefixes and centralizes internal key lists. |
| litellm/llms/anthropic/chat/transformation.py | Adds Opus 4.6 adaptive thinking mapping and drops thinking when last assistant message lacks thinking blocks. |
| litellm/llms/bedrock/beta_headers_config.py | Introduces centralized whitelist/translation for Bedrock anthropic-beta headers with version/family gating. |
| litellm/llms/bedrock/chat/converse_transformation.py | Uses centralized beta filter; strips unsupported context_management body param; adds Opus 4.6 adaptive thinking path. |
| litellm/llms/bedrock/messages/invoke_transformations/anthropic_claude3_transformation.py | Filters/translates beta headers via centralized filter and strips context_management from body for Invoke Messages. |
| litellm/router.py | Stores resolved custom_llm_provider into deployment params to make provider visible to callbacks post-alias resolution. |
| litellm/utils.py | Adds helper to detect missing thinking blocks in last assistant message; minor BedrockModelInfo import refactor. |
Sequence Diagram
```mermaid
sequenceDiagram
    participant Client
    participant Router as litellm.Router
    participant WSI as WebSearchInterceptionLogger
    participant Provider as Bedrock/Anthropic
    participant Search as litellm.asearch
    Client->>Router: completion(model alias, messages, tools, thinking)
    Router->>WSI: pre_api_call(kwargs incl. resolved custom_llm_provider)
    WSI->>WSI: convert hosted web_search tool -> regular tool
    WSI-->>Router: return {**kwargs, tools: converted}
    Router->>Provider: initial LLM request
    Provider-->>Router: response(content blocks)
    Router->>WSI: async_post_call_success(response, kwargs)
    WSI->>WSI: transform_request() extracts tool_use + thinking blocks
    alt has websearch tool_use
        WSI->>Search: asearch(query, provider, api_key/api_base from router.search_tools)
        Search-->>WSI: search result(s)
        WSI->>WSI: transform_response() builds assistant msg (thinking + tool_use) + user tool_result
        WSI->>Provider: follow-up LLM request(max_tokens adjusted if <= thinking budget)
        Provider-->>WSI: final response
    else no websearch
        WSI-->>Router: passthrough
    end
```
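The `transform_request()` extraction step above might look like the following. This is illustrative only — `extract_websearch_calls` and `WEBSEARCH_TOOL_NAMES` are hypothetical names, not the actual implementation in `transformation.py`:

```python
from typing import Any, Dict, List, Tuple

# Assumed tool names for illustration; the real handler matches the tool it
# registered during pre_api_call conversion.
WEBSEARCH_TOOL_NAMES = {"web_search", "websearch"}


def extract_websearch_calls(
    content_blocks: List[Dict[str, Any]],
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Split an Anthropic-style content array into websearch tool_use blocks
    and thinking blocks, mirroring the extraction step in the diagram."""
    tool_calls = [
        b for b in content_blocks
        if b.get("type") == "tool_use" and b.get("name") in WEBSEARCH_TOOL_NAMES
    ]
    thinking_blocks = [
        b for b in content_blocks
        if b.get("type") in ("thinking", "redacted_thinking")
    ]
    return tool_calls, thinking_blocks
```

The agentic loop only runs when `tool_calls` is non-empty; `thinking_blocks` is carried forward so the follow-up assistant message can satisfy Anthropic's block-ordering rule.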
**Review Findings**
Thanks for the comprehensive work on enabling websearch + extended thinking on Bedrock!
**CI Status**
**Must Fix**
1. **Lint failure: function too long** — The
2. **Multiple test failures in beta headers code** — Several new tests are failing:
The beta header filtering logic appears to have bugs in version/model matching that need investigation.
3. **Missing**
*Force-pushed from c93a203 to 18dc198*
*Force-pushed from 18dc198 to ab743af*
**Greptile Summary**
This PR implements three main feature areas: (1) websearch interception with extended thinking support on Bedrock, (2) centralized beta header filtering for all Bedrock APIs, and (3) improved thinking param handling for assistant messages without thinking blocks.
**Key concerns:**
What works well:
Confidence Score: 2/5
| Filename | Overview |
|---|---|
| litellm/constants.py | Removes :0 suffix from Claude Opus 4.6 model ID in BEDROCK_CONVERSE_MODELS. Minor change, already discussed in prior review thread. |
| litellm/integrations/websearch_interception/handler.py | Major changes: thinking block preservation through websearch agentic loop, API key/base loading from router search_tools config, max_tokens auto-adjustment for thinking budget. The max_tokens adjustment only checks type == "enabled" but not "adaptive" (minor inconsistency). |
| litellm/integrations/websearch_interception/transformation.py | Refactored to use NamedTuple return types (TransformedRequest, TransformedResponse). Added thinking block capture and preservation. Clean, well-tested changes. |
| litellm/litellm_core_utils/core_helpers.py | Extracted internal params to module-level constants (INTERNAL_PARAMS, INTERNAL_PARAMS_PREFIXES) with prefix-based filtering. Added _is_param_internal() helper. Mostly formatting changes alongside the refactor. |
| litellm/llms/anthropic/chat/transformation.py | Adds last_assistant_message_has_no_thinking_blocks check alongside existing tool_calls check to drop thinking param when assistant messages have text but no thinking blocks. |
| litellm/llms/base_llm/chat/transformation.py | Adds "adaptive" thinking type recognition in is_thinking_enabled(), supporting Opus 4.6. Small, targeted change. |
| litellm/llms/bedrock/beta_headers_config.py | New centralized module for Bedrock beta header filtering with version-based model support, family restrictions, and header translations. Well-documented and extensible design. |
| litellm/llms/bedrock/chat/converse_transformation.py | CRITICAL: Removes native structured outputs support (_supports_native_structured_outputs, _create_output_config_for_response_format, _add_additional_properties_to_schema, outputConfig handling) and renames _is_nova_2_model to _is_nova_lite_2_model (excluding Nova-2-Pro). These removals break existing tests and remove production features. |
| litellm/llms/bedrock/messages/invoke_transformations/anthropic_claude3_transformation.py | Integrates centralized beta header filter with translation support, strips context_management, removes Sonnet 4.6 patterns from interleaved thinking support. Simplifies tool search beta header logic. |
| litellm/router.py | Stores custom_llm_provider in deployment's litellm_params after alias resolution, enabling callbacks (websearch_interception) to access the resolved provider. |
| litellm/utils.py | Adds _message_has_thinking_blocks() helper and last_assistant_message_has_no_thinking_blocks() function. Refactors existing any_assistant_message_has_thinking_blocks to use shared helper. Well-tested changes. |
Flowchart
```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[User Request with web_search tool] --> B{WebSearchInterceptionLogger\nasync_log_pre_api_call}
    B -->|Provider not enabled| C[Pass through to LLM]
    B -->|Provider enabled| D[Convert web_search to\nLiteLLM standard tool]
    D --> E[LLM Response]
    E --> F{async_should_run_agentic_loop}
    F -->|No WebSearch tool_use| G[Return response]
    F -->|WebSearch tool_use detected| H[Extract tool_calls +\nthinking_blocks]
    H --> I[_execute_agentic_loop]
    I --> J[Load search credentials\nfrom router search_tools]
    J --> K[Execute parallel searches\nvia litellm.asearch]
    K --> L[Build follow-up messages\nwith thinking + tool_result]
    L --> M{max_tokens <= budget_tokens?}
    M -->|Yes| N[Adjust max_tokens =\nbudget + DEFAULT_MAX_TOKENS]
    M -->|No| O[Keep original max_tokens]
    N --> P[anthropic_messages.acreate\nfollow-up request]
    O --> P
    P --> Q[Return final response]
    subgraph Beta Header Filtering
        R[anthropic-beta headers] --> S{BedrockBetaHeaderFilter}
        S --> T[Whitelist check]
        T --> U[Version-based filtering]
        U --> V[Family restrictions]
        V --> W[Header translation\ne.g. advanced-tool-use]
        W --> X[Filtered headers to AWS]
    end
```
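The Beta Header Filtering subgraph can be sketched as a whitelist with optional version/family gating and translation. All header values and rules below are illustrative assumptions, not the real whitelist in `beta_headers_config.py`:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class BetaHeaderRule:
    """One whitelisted anthropic-beta value with optional gating."""
    min_version: Optional[float] = None   # e.g. 4.5 for "requires 4.5+"
    families: Optional[Set[str]] = None   # e.g. {"opus", "sonnet"}
    translate_to: Optional[str] = None    # rewritten value for AWS


# Illustrative rules only (header strings and gates are assumptions).
RULES: Dict[str, BetaHeaderRule] = {
    "interleaved-thinking-2025-05-14": BetaHeaderRule(min_version=4.0),
    "context-management-2025-06-27": BetaHeaderRule(),
    "advanced-tool-use-beta": BetaHeaderRule(translate_to="tool-use-beta"),
}


def filter_beta_headers(
    values: List[str], model_family: str, model_version: float
) -> List[str]:
    """Whitelist check -> version gate -> family gate -> translation."""
    out: List[str] = []
    for value in values:
        rule = RULES.get(value)
        if rule is None:
            continue  # not whitelisted: never reaches AWS
        if rule.min_version is not None and model_version < rule.min_version:
            continue
        if rule.families is not None and model_family not in rule.families:
            continue
        out.append(rule.translate_to or value)
    return out
```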
Last reviewed commit: ab743af
Additional Comments (2)
The patterns for
Similarly to the interleaved thinking removal above, Sonnet 4.6 patterns were removed from
However, the centralized
*Force-pushed from 32f4311 to 029bc83*
When Anthropic's extended thinking is enabled, assistant messages must start with thinking blocks before tool_use blocks. The agentic loop was creating follow-up messages with only tool_use blocks, causing validation errors. This change ensures thinking blocks from the original response are preserved and included at the start of follow-up assistant messages.
- Created `TransformRequestResult` NamedTuple to capture both tool_calls and thinking_blocks from `transform_request()`, making the contract explicit and extensible
- Modified `transform_request()` to extract and return thinking/redacted_thinking blocks alongside tool calls
- Updated `transform_response()` to accept thinking_blocks and prepend them to follow-up assistant messages
- Passed thinking_blocks through the agentic loop chain: detection → execution → message transformation
- Fixed `transform_request()` to return full kwargs (not just tools) to preserve other request parameters
- Used `filter_internal_params()` utility instead of manual filtering for consistency
This change fixes websearch interception when extended thinking mode is enabled.
**Problem**: When Anthropic's extended thinking is enabled, assistant messages must start with thinking blocks before tool_use blocks. The agentic loop was creating follow-up messages with only tool_use blocks, causing the error: `messages.1.content.0.type: Expected 'thinking' or 'redacted_thinking', but found 'tool_use'`
**Solution**: Modified `transform_request()` to capture thinking/redacted_thinking blocks from the original response, and `transform_response()` to include them at the start of the assistant message in follow-up requests.
**Testing**: Successfully tested end-to-end with Claude Code → LiteLLM Proxy → AWS Bedrock → Claude Opus 4.5.
```yaml
model_list:
  - model_name: claude-opus-4-5-20251101
    litellm_params:
      model: bedrock/us.anthropic.claude-opus-4-5-20251101-v1:0
      aws_region_name: us-west-2
    model_info:
      supports_web_search: true

litellm_settings:
  callbacks: ["websearch_interception"]
  websearch_interception_params:
    enabled_providers: ["bedrock"]
    search_tool_name: "searxng-search"

search_tools:
  - search_tool_name: searxng-search
    litellm_params:
      search_provider: searxng
      api_base: "https://searxng.example.com"
```
**Note**: Uses `bedrock/` (not `bedrock/converse/`) to route through `anthropic_messages_handler()` which supports agentic hooks.
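The thinking-block-ordering fix described above amounts to prepending the captured blocks when the follow-up assistant message is assembled. A minimal sketch (the function name is hypothetical; the real logic lives in `transform_response()`):

```python
from typing import Any, Dict, List


def build_followup_assistant_message(
    thinking_blocks: List[Dict[str, Any]],
    tool_use_blocks: List[Dict[str, Any]],
) -> Dict[str, Any]:
    """With extended thinking enabled, Anthropic requires an assistant
    message's content to start with thinking/redacted_thinking blocks
    before any tool_use block. Prepending the blocks captured from the
    original response avoids:
      messages.1.content.0.type: Expected 'thinking' or
      'redacted_thinking', but found 'tool_use'
    """
    return {"role": "assistant", "content": [*thinking_blocks, *tool_use_blocks]}
```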
Fixes issue where websearch interception failed with "TAVILY_API_KEY is not set" error when using search providers that require API keys.

Changes:
- Extract api_key and api_base from router search_tools configuration
- Pass credentials to litellm.asearch() when available
- Fall back to environment variables when credentials are not in config
- Maintain backward compatibility with existing configurations

Root cause: Handler was only extracting search_provider from router config, but not the associated api_key and api_base fields. This caused litellm.asearch() to fall back to environment variables, which failed when keys weren't set in env.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
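The credential lookup described in this commit might be sketched as follows (hypothetical helper name; the real extraction is inside the handler's agentic loop):

```python
from typing import Any, Dict, List, Optional, Tuple


def resolve_search_credentials(
    search_tools: List[Dict[str, Any]], search_tool_name: str
) -> Tuple[Optional[str], Optional[str], Optional[str]]:
    """Look up (search_provider, api_key, api_base) for the configured tool.

    Returning None for api_key/api_base lets litellm.asearch() fall back
    to environment variables, preserving backward compatibility.
    """
    for tool in search_tools:
        if tool.get("search_tool_name") != search_tool_name:
            continue
        params = tool.get("litellm_params", {})
        return (
            params.get("search_provider"),
            params.get("api_key"),
            params.get("api_base"),
        )
    return None, None, None
```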
Fixes websearch interception failures when thinking.budget_tokens is set and requests violate Anthropic's requirement: max_tokens > budget_tokens.

Changes:
- Validate max_tokens against thinking.budget_tokens when extended thinking is enabled
- Automatically adjust max_tokens to budget_tokens + DEFAULT_MAX_TOKENS (4096) when insufficient
- Follows the same pattern as base transformation classes in LiteLLM

This prevents the error "max_tokens must be greater than thinking.budget_tokens" when using extended thinking with websearch interception.

Related issue: BerriAI#14194

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
*Force-pushed from 029bc83 to 4365ae0*
…pport

Standardize anthropic-beta header handling across all Bedrock APIs (Invoke Chat, Converse, Messages) using a centralized whitelist-based filter with version-based model support.

Problems:
- Inconsistent filtering: Invoke Chat used a whitelist (safe), Converse/Messages used blacklists (allow unsupported headers through)
- Production risk: unsupported headers could cause AWS API errors
- Maintenance burden: adding new Claude models required updating multiple hardcoded lists

Approach:
- Centralized BedrockBetaHeaderFilter with whitelist approach
- Version-based filtering (e.g., "requires 4.5+") instead of model lists
- Family restrictions (opus/sonnet/haiku) when needed
- Automatic header translation for backward compatibility

Changes:
- Add `litellm/llms/bedrock/beta_headers_config.py`: BedrockBetaHeaderFilter class, whitelist of 11 supported beta headers, version/family restriction logic, debug logging support
- Invoke Chat: replace local whitelist with centralized filter
- Converse: remove blacklist (30 lines), use whitelist filter
- Messages: remove complex filter (55 lines), preserve translation
- Add `tests/test_litellm/llms/bedrock/test_beta_headers_config.py`: 40+ unit tests for filter logic
- Extend `tests/test_litellm/llms/bedrock/test_anthropic_beta_support.py`: 13 integration tests for API transformations; verify filtering, version restrictions, translations
- Add `litellm/llms/bedrock/README.md`: maintenance guide for adding new headers/models; enhanced module docstrings with examples

Result:
- Production safety: only whitelisted headers reach AWS
- Zero maintenance for new Claude models (Opus 5, Sonnet 5, etc.)
- Consistent filtering across all 3 APIs
- Preserved backward compatibility (advanced-tool-use translation)

```bash
poetry run pytest tests/test_litellm/llms/bedrock/ -v
```

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…ock APIs

Bedrock doesn't support context_management as a request body parameter. The feature is enabled via the anthropic-beta header (context-management-2025-06-27), which was already handled correctly. Leaving context_management in the body causes: "context_management: Extra inputs are not permitted"

Strip the parameter from all 3 Bedrock API paths:
- Invoke Messages API
- Invoke Chat API
- Converse API (additionalModelRequestFields)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
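The strip step this commit describes is a one-field removal applied on each Bedrock request path. A minimal sketch (hypothetical function name; the actual code lives in the three transformation modules):

```python
from typing import Any, Dict


def strip_context_management(request_body: Dict[str, Any]) -> Dict[str, Any]:
    """Bedrock rejects context_management in the request body
    ("context_management: Extra inputs are not permitted"). The feature is
    driven by the anthropic-beta header instead, so drop the body field
    before sending the request to AWS."""
    body = dict(request_body)  # copy so the caller's dict is untouched
    body.pop("context_management", None)
    return body
```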
…without thinking blocks

Follow-up to a494503f4b, which fixed thinking + tool_use. That fix only detected missing thinking blocks on assistant messages with tool_calls. When the last assistant message has plain text content (no tool_calls), the check returned False and thinking was not dropped, causing: "Expected thinking or redacted_thinking, but found text"

Add last_assistant_message_has_no_thinking_blocks() to detect any assistant message with content but no thinking blocks. Extract a shared _message_has_thinking_blocks() helper that checks both the thinking_blocks field and the content array for thinking/redacted_thinking blocks.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
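A sketch of the two helpers this commit describes. The names follow the commit message, but the bodies are assumptions, not the actual `litellm/utils.py` code:

```python
from typing import Any, Dict, List


def _message_has_thinking_blocks(message: Dict[str, Any]) -> bool:
    """True if the message carries thinking either via the thinking_blocks
    field or as thinking/redacted_thinking entries in a content array."""
    if message.get("thinking_blocks"):
        return True
    content = message.get("content")
    if isinstance(content, list):
        return any(
            block.get("type") in ("thinking", "redacted_thinking")
            for block in content
        )
    return False


def last_assistant_message_has_no_thinking_blocks(
    messages: List[Dict[str, Any]],
) -> bool:
    """True when the most recent assistant message has content (text or
    tool_use alike) but no thinking blocks, signalling that the thinking
    param should be dropped on the next request."""
    for message in reversed(messages):
        if message.get("role") != "assistant":
            continue
        return bool(message.get("content")) and not _message_has_thinking_blocks(message)
    return False
```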
Upstream only checks for type="enabled" but Opus 4.6 uses type="adaptive". Without this fix, max_tokens auto-adjustment doesn't trigger for adaptive thinking, causing API errors.
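The fix amounts to recognizing both thinking types in the enablement check. A minimal sketch under the assumption that the upstream check compares `thinking["type"]` against a literal (the real method lives in `litellm/llms/base_llm/chat/transformation.py`):

```python
from typing import Any, Dict, Optional


def is_thinking_enabled(thinking: Optional[Dict[str, Any]]) -> bool:
    """Treat both type="enabled" (classic extended thinking) and
    type="adaptive" (Opus 4.6) as thinking-enabled, so downstream guards
    such as the max_tokens auto-adjustment fire for both."""
    return bool(thinking) and thinking.get("type") in ("enabled", "adaptive")
```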
*Force-pushed from 4365ae0 to 2350657*
fix bedrock pii redaction null value handling
@ryangoldblatt-bm to sign the CLAs 🙏
Done sir, shall we reopen this?
Summary

Rebased on BerriAI/litellm `main` (Feb 18, 2026) with the following fixes on top of cherry-picked PR #20488:

**Websearch Interception**
- Load `api_key`/`api_base` from router's `search_tools` config (fixes "TAVILY_API_KEY is not set")
- Adjust `max_tokens` when `<= thinking.budget_tokens` (Anthropic requires `max_tokens > budget_tokens`)

**Bedrock**
- Strip `context_management` from request body for all Bedrock APIs (Invoke Messages, Invoke Chat, Converse)

**Thinking**
- Recognize the `adaptive` thinking type in `is_thinking_enabled` (Opus 4.6)

Test Plan
🤖 Generated with Claude Code